NetNews Usenet Archive 1992 #16

Internet Message Format | 1992-07-29 | 5.4 KB

Path: sparky!uunet!cis.ohio-state.edu!magnus.acs.ohio-state.edu!usenet.ins.cwru.edu!agate!ames!bionet!snorkelwacker.mit.edu!thunder.mcrcim.mcgill.edu!sifon!peterd
From: peterd@cc.mcgill.ca (Peter Deutsch)
Newsgroups: alt.gopher
Subject: Re: index for all of gopherspace
Message-ID: <1992Jul29.153142.2281@sifon.cc.mcgill.ca>
Date: 29 Jul 92 15:31:42 GMT
References: <1992Jul27.180509.27470@mercury.unt.edu> <1992Jul27.201122.17017@msuinfo.cl.msu.edu> <1992Jul28.194722.19492@nstn.ns.ca>
Sender: news@sifon.cc.mcgill.ca
Organization: Bunyip Information Systems (the archie people)
Lines: 103
Nntp-Posting-Host: expresso.cc.mcgill.ca

In article <1992Jul28.194722.19492@nstn.ns.ca> daniel@nstn.ns.ca (Daniel MacKay) writes:

>Hello!
. . .
>Peter Deutsche writes:

First, a little administrivia note - that's "Deutsch", not
"Deutsche". Think of me as a noun, not an adjective... :-)

>> While talking about indexing Gopherspace, the next release of archie
>> will allow gathering of arbitrary collections of information and ...
>> [...]
>> and we plan to put
>> in pointers to the various Gopher servers as a proof of concept.
>
>I was describing my idea to a visitor to my office, and he suggested
>exactly the same thing- there be a file with a well-known-name available on
>every gopher that is authoritative for some resource. This file would have
>a wordy and search-rich description of the resources, i.e. an abstract for
>the resource written in a way that it will be full of keywords that people
>are likely to use when they're looking for it. The one for the recipes
>database would have words like "recipes" "food" "cooking" (but would it
>contain an entry for "dessert" "entree"? Hmm.)

Yup, this sounds like an IAFA-like template for each site admin
to fill in, with liberal use of the "Keywords" and "Comments"
fields. I like this approach because it is easy for admins (so
they'll do it) and has enough structure to make searching a lot
easier.
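
As a rough illustration of the template idea above, here is a minimal Python sketch that reads a "Field: value" record and pulls out its keywords. Only the "Keywords" and "Comments" field names come from the discussion; the "Title" field, the comma-separated keyword list, the indented-continuation rule, and the sample record are all assumptions made for the example, not the actual IAFA definition.

```python
def parse_template(text):
    """Parse 'Field: value' lines into a dict.

    Assumption: a line starting with whitespace continues the previous field.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and current is not None:
            # Continuation line: append to the field being built.
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def keywords(fields):
    """Return the lower-cased keywords (assumed to be comma-separated)."""
    raw = fields.get("Keywords", "")
    return [w.strip().lower() for w in raw.split(",") if w.strip()]


# Hypothetical record an admin might fill in for the recipes database.
SAMPLE = """\
Title: Recipes database
Keywords: recipes, food, cooking,
  dessert
Comments: Searchable collection of recipes.
"""

record = parse_template(SAMPLE)
print(keywords(record))
```

The point of the structure is that a tool can go straight to the "Keywords" field rather than guessing which words in a free-form file matter.
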

>Peter's, or someone's, robot could sweep through the gopher servers
>periodically, collect the files, and build the Index Into Gopherspace.
>
>My point: there are not *that* many things in gopherspace once you take
>out all the redundancy. If we only collect descriptions from people who
>are authoritative for their resource, the problem dwindles into something
>quite doable- think of how much smaller Peter's archie database would be if
>there was only one entry in it for every ftp'able resource!

This is exactly what we are saying about anonFTP, too. I believe
we have already reduced duplication to some extent with archie:
now that people have some expectation that they can find something
again, they no longer feel quite so inclined to store old copies.
Of course, there is still a lot of "pack-ratting" going on. To
address this completely we really need unique document identifiers
and resource serial numbers deployed to allow the tools to detect
and eliminate the duplicates. Work is going on through the IETF to
get these defined and deployed ASAP.

>Anyway, it results in an entity that's like a couple of things:
>   a) Peter's "whatis" project.
>   b) My gopherizing of the SRI NISC List-of-Lists, available on the
>      nstn.ns.ca gopher; check it out as
>          Internet Resources
>              Mail Lists
>                  Mail List Subject Search
>
>The important thing is that the *name* of something usually doesn't tell
>you much about it. So a robot indexing all the words of the menu items it
>finds in gopherspace is relatively useless. Someone carefully writing a
>description of their resource, keeping in mind the kind of keywords a user
>might use when looking for it, is *much* better. Yeah, writing it is work.
>But- garbage in, garbage out. If you're authoritative for a resource,
>presumably you should be able to take the time to describe the resource
>once.

Again, this is exactly the same problem we found in our anonFTP
experience, and it was a driving force behind our work with IAFA.
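
To make the "collect the descriptions, then index them" step concrete, here is a small Python sketch of an inverted index built from hand-written descriptions. It is only an illustration under stated assumptions: the resource labels and description texts are invented, and this is not how archie or WAIS actually store or search their data.

```python
from collections import defaultdict


def build_index(descriptions):
    """Map each word to the set of resources whose description contains it."""
    index = defaultdict(set)
    for resource, text in descriptions.items():
        for word in text.lower().split():
            word = word.strip('.,;:!?"()')  # drop surrounding punctuation
            if word:
                index[word].add(resource)
    return index


def search(index, query):
    """Return resources whose description contains every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical descriptions, one per authoritative resource.
DESCRIPTIONS = {
    "recipes-db": "Recipes database: food, cooking, dessert and entree ideas.",
    "weather": "Canadian weather forecasts, updated daily.",
}

index = build_index(DESCRIPTIONS)
print(search(index, "cooking dessert"))
print(search(index, "canadian"))
```

Because each resource contributes one carefully written description instead of many menu titles, the index stays small and the words in it are the ones users are likely to type.
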
Once we have a standardized way of encoding information, the tools
can be deployed to index and search it (given that we now have
archie, WAIS, etc., they're already written at this point).

>Peter continues:
>> Now, the missing link here seems to be a simple way to get the index
>> info out of Gopher.
>I don't think that's much of a problem. *The* thing that gopher's best at
>is delivering documents. And I think it's quite reasonable to have a
>Well-Known-Directory containing descriptions of the data for which that
>gopher is authoritative, (e.g. me and CA*Net news, Canadian Weather).
>
>On a related topic- Peter, does that mean that the archie data and the
>whatis data will be integrated? I had a complaint from a user the other
>day that my gopher/archie gateway didn't deliver whatis data, and I
>thought- right! Why doesn't it?

Well, the info templates we gather will automate the maintenance
of the "whatis" database, but I believe that the reason you can't
get to the current archie whatis data is simply that the current
Gopher gateway doesn't support the "whatis" command. This is a
minor loss right now, since the database is not automatically
maintained, but hopefully someone will improve the gateway once
the new release is deployed, to allow Gophereens to access all the
new databases.

If someone wants to join the 3.0 client development team, drop us
a line (actually, send the note to Alan Emtage, "bajan@bunyip.com",
since he's coordinating this work). We're just about to release
stuff to the client writers for 3.0.

                                - peterd