Editor's note: These minutes have not been edited.
0. Agenda review/changes
The proposed agenda was accepted without changes.
1. Why two parallel CIP drafts?
Patrik explained that he and Roland shared the view that Chris
Weider's draft didn't reflect the consensus the group reached at the
LA meeting and also contained too much whois++ material. Therefore a
second draft was produced by Jeff Allen and Patrik Faltstrom. The
intended outcome is that these two drafts will be merged into
one.
2. Charter of the find group
There was some discussion about which papers were going to be
produced. The consensus was that there should be one document
specifying the CIP, another one specifying how to use centroids as one
special case of indexes within the CIP and, for each client-server
protocol that is going to use the CIP, one paper describing the
mapping between the data representations and one describing the access
method.
3. LDAP/CIP work at Umea University
Roland Hedberg presented the work he has been doing to enable an X.500
DSA to work as an index server, and he also presented a WWW interface
that can use this index server.
The WWW interface can be reached at
http://macavity.umdc.umu.se/~roland/query2.en.html and the
index server it accesses contains all the information presently accessible
in the Swedish branch of the X.500 DIT (~50,000 entries). For the time
being the index only contains names of people. Roland will produce a
draft describing the objectclass and attributes needed to accomplish
this.
4. The new CIP draft
Jeff Allen presented the gist of the new draft. The discussion following
the presentation led to several unresolved items:
- The use of MIME: should/can INDEX-CHANGED be structured as a MIME
  message?
- Aggregation a la CIDR, to facilitate query routing.
- Incremental updates: per application domain or general?
- Security: both regarding exporting indexes and data protection.
- Centroid scaling issues: certain datasets only contain unique items,
  which means that the resulting index is no smaller than the original
  dataset.
- Frontends to index servers might only speak one access protocol.
  Clients speaking another access protocol cannot pass such a server
  while climbing the tree upwards or downwards, which means that parts
  of the mesh might be inaccessible to the client.
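The last item can be illustrated with a small sketch. The mesh below is
entirely hypothetical (server names, protocol assignments and links are
invented for illustration and do not come from the draft); it only shows
how a server that speaks a single access protocol can cut a client off
from servers behind it that would otherwise be usable.

```python
from collections import deque

# Hypothetical mesh: server -> (protocols its frontend speaks, neighbours).
# All names and protocol assignments are invented for illustration.
MESH = {
    "root": ({"whois++", "ldap"}, ["se", "uk"]),
    "se":   ({"ldap"},            ["root", "umu"]),
    "uk":   ({"whois++"},         ["root", "lut"]),
    "umu":  ({"ldap"},            ["se"]),
    "lut":  ({"whois++", "ldap"}, ["uk"]),
}

def reachable(mesh, start, protocol):
    """Return the servers a client speaking `protocol` can reach from
    `start`, moving up or down the tree only through servers whose
    frontend speaks that protocol."""
    if protocol not in mesh[start][0]:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in mesh[node][1]:
            if neighbour not in seen and protocol in mesh[neighbour][0]:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

if __name__ == "__main__":
    # "lut" speaks ldap, but sits behind "uk", which does not.
    print(sorted(reachable(MESH, "root", "ldap")))
```

In this made-up mesh an LDAP client starting at "root" never sees "lut",
although "lut" itself speaks LDAP, because the only path to it passes
through a whois++-only server.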
5. Workshop on Distributed Indexing and Searching
Erik Selberg presented some ideas on using query routing within the
Web indexing sphere which came out of the workshop. It was felt that
introducing query routing and distributed index servers is a necessary
step in the development of the Web indexes since the current centralized
approach doesn't scale. More info on the workshop can be found at
http://www.w3.org/pub/WWW/Search/9605-Indexing-Workshop/
It was agreed that follow-up work undertaken by the query routing
contingent from the Distributed Indexing/Searching Workshop would
be folded into the FIND working group.
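As a rough illustration of what query routing means here, the sketch
below forwards a query only to those servers whose index summary could
match it, instead of sending every query to one central index. The
server names and token sets are hypothetical, and the matching rule
(all query tokens must appear in a server's summary) is a simplifying
assumption, not something specified at the workshop or in the CIP
drafts.

```python
# Hypothetical summaries: each child server is described by the set of
# tokens that occur in its data. Names and tokens are invented.
CHILD_SUMMARIES = {
    "server-a": {"hedberg", "allen", "umea"},
    "server-b": {"hamilton", "ccso"},
    "server-c": {"allen", "selberg"},
}

def route(query_tokens, summaries):
    """Return the children whose summary contains every query token,
    i.e. the only servers worth forwarding the query to."""
    wanted = {token.lower() for token in query_tokens}
    return sorted(
        name for name, tokens in summaries.items() if wanted <= tokens
    )

if __name__ == "__main__":
    print(route(["Allen"], CHILD_SUMMARIES))
    print(route(["allen", "umea"], CHILD_SUMMARIES))
```

A summary match only says that a server might hold a matching entry, so
the forwarded query still has to be evaluated by that server.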
6. The CIP and CCSO
Martin Hamilton presented his work on integrating CCSO nameservers
with the CIP. His conclusion was that it was viable but that there
remained some items that have to be resolved. There is no standard
URL format for a CIP referral to a CCSO nameserver. For the time being
Martin proposed that one could use the gopher URL format
(gopher://ccso.server.domain.name:105/2).
Another question is whether the CCSO attribute names and types should
be normalized to a common schema.
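A minimal sketch of both points, under stated assumptions: the URL
builder simply reproduces the gopher form quoted above, and the field
mapping is purely hypothetical (the minutes do not say which common
schema, if any, would be used). The function names are illustrative
only.

```python
CCSO_PORT = 105  # port that appears in the gopher URL quoted in the minutes

def ccso_referral_url(host, port=CCSO_PORT):
    """Build a gopher-style referral URL for a CCSO nameserver,
    following the interim form proposed in the minutes."""
    return f"gopher://{host}:{port}/2"

# Hypothetical mapping from local CCSO field names to a common schema.
# The target names are assumptions chosen for illustration.
COMMON_SCHEMA = {
    "name": "cn",
    "email": "mail",
    "phone": "telephoneNumber",
}

def normalize(record, mapping=COMMON_SCHEMA):
    """Rename known fields to the common schema; leave unknown fields
    untouched so that no data is lost."""
    return {mapping.get(key.lower(), key): value for key, value in record.items()}

if __name__ == "__main__":
    print(ccso_referral_url("ccso.server.domain.name"))
    print(normalize({"Name": "A. Person", "dept": "Computing"}))
```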
7. Scaling of the CIP
Patrik presented some graphs showing the relationship between the
size of a centroid and the size of the actual dataset, both for
information about people from the phonebook and for large document
collections. The phonebook data revealed the not very astonishing
fact that phone numbers are unique, which means that the centroid
grew almost linearly with the dataset. Removing
phone numbers from the centroid gave much slower growth, and the curve
appeared to be asymptotic. When indexing words out of documents the
curve didn't seem to level off as the dataset grew (max dataset size
~12,000,000 tokens). When applying a stop list weeding out very
frequent words and very unusual words the curve became asymptotic,
levelling off at 60,000.
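The effect can be reproduced on toy data. The sketch below assumes a
centroid is the set of unique tokens per field, and uses a synthetic
phonebook invented for illustration (it is not Patrik's data, and the
numbers it produces say nothing about the real measurements). The
stop-list helper is likewise an assumed, simple frequency-based filter.

```python
import random
from collections import Counter

def centroid_size(records, skip_fields=()):
    """Count the unique (field, token) pairs across all records."""
    centroid = set()
    for record in records:
        for field, value in record.items():
            if field in skip_fields:
                continue
            for token in value.lower().split():
                centroid.add((field, token))
    return len(centroid)

def synthetic_phonebook(n, seed=0):
    """Invented data: names drawn from a small pool, one unique phone
    number per entry."""
    rng = random.Random(seed)
    first = [f"first{i}" for i in range(50)]
    last = [f"last{i}" for i in range(200)]
    return [
        {"name": f"{rng.choice(first)} {rng.choice(last)}",
         "phone": str(1000000 + i)}
        for i in range(n)
    ]

def apply_stop_list(tokens, min_count, max_count):
    """Keep only tokens whose frequency lies in [min_count, max_count],
    dropping very frequent and very unusual words."""
    counts = Counter(tokens)
    return {t for t, c in counts.items() if min_count <= c <= max_count}

if __name__ == "__main__":
    for n in (1000, 10000):
        book = synthetic_phonebook(n)
        print(n, centroid_size(book), centroid_size(book, skip_fields=("phone",)))
```

With phone numbers included the centroid has at least one entry per
record, so it grows with the dataset. Without them it is bounded by the
size of the name pool (250 tokens in this toy setup).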