NetNews Usenet Archive 1992 #19

Text File | 1992-08-25 | 1.5 KB | 35 lines

Newsgroups: comp.databases
From: gtoal@pizzabox.demon.co.uk (Graham Toal)
Path: sparky!uunet!pipex!demon!pizzabox.demon.co.uk!gtoal
Subject: free text indexing algorithm refs wanted...
Distribution: world
Organization: Cuddlehogs Anonymous
Lines: 23
Date: Tue, 25 Aug 1992 19:15:21 +0000
Message-ID: <714787738snx@pizzabox.demon.co.uk>
Sender: usenet@gate.demon.co.uk

Having implemented a database using a really cool set of algorithms
and data structures, which let us do just about anything, we find
it's depressingly slow compared to some commercial systems. We've
worked out what they must be doing by treating them as a black box,
throwing queries at them, and working out the complexity of the
search algorithms they must be using.

Any pointers to papers or articles on different data structures for
free text systems? We're not going to reimplement it now; we're just
interested in why people made particular decisions, where the
tradeoffs are, etc.

For instance, we have hierarchically structured SGML files and can
search for words between arbitrary start and end tags; some systems
have a fixed number of fields and store the presence of a word in
each field in a bit vector. It's things like that we're interested
in. Which tradeoffs for speed and size are considered worthwhile by
the users? E.g. would they be upset that such-and-such a structure
doesn't allow sentence or qualified proximity searching?

Replies either here or by email gratefully received.

Many thanks
Graham
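
For readers unfamiliar with the fixed-field scheme mentioned in the post, here is a minimal sketch of one way it might look. It is not taken from any of the commercial systems discussed; the field names, the `build_index` and `search` helpers, and the sample documents are all hypothetical and chosen only for illustration. Each word maps to a small per-document bit mask with one bit per field.

```python
from collections import defaultdict

# Hypothetical fixed field layout; the post does not name any fields.
FIELDS = ("title", "author", "body")
FIELD_BIT = {name: 1 << i for i, name in enumerate(FIELDS)}


def build_index(docs):
    """Map word -> {doc_id: bit mask of the fields containing that word}."""
    index = defaultdict(dict)
    for doc_id, doc in docs.items():
        for field, text in doc.items():
            for word in text.lower().split():
                # Set this field's bit in the word's mask for this document.
                index[word][doc_id] = index[word].get(doc_id, 0) | FIELD_BIT[field]
    return index


def search(index, word, field):
    """Return the sorted ids of documents whose given field contains the word."""
    bit = FIELD_BIT[field]
    postings = index.get(word.lower(), {})
    return sorted(doc_id for doc_id, mask in postings.items() if mask & bit)


# Made-up sample data, for illustration only.
docs = {
    1: {"title": "free text indexing", "author": "toal", "body": "inverted files and tries"},
    2: {"title": "sgml parsing", "author": "smith", "body": "free text inside nested tags"},
}
index = build_index(docs)
```

A field-restricted lookup is then a single mask test per posting, e.g. `search(index, "free", "title")`. The mask records only that a word occurs somewhere in a field, not where, so a structure like this cannot by itself answer sentence or proximity queries, and it cannot express arbitrary nested tags; that is the kind of speed-versus-capability tradeoff the post asks about.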