- Newsgroups: comp.databases
- From: gtoal@pizzabox.demon.co.uk (Graham Toal)
- Path: sparky!uunet!pipex!demon!pizzabox.demon.co.uk!gtoal
- Subject: free text indexing algorithm refs wanted...
- Distribution: world
- Organization: Cuddlehogs Anonymous
- Lines: 23
- Date: Tue, 25 Aug 1992 19:15:21 +0000
- Message-ID: <714787738snx@pizzabox.demon.co.uk>
- Sender: usenet@gate.demon.co.uk
-
- Having implemented a database using a really cool set of algorithms
- and data structures, which let us do just about anything, we find it's
- depressingly slow compared to some commercial systems. We have
- worked out what they must be doing by treating them as a black box
- and throwing queries at them, and working out the complexity of the
- search algorithms they must be using. Any pointers to papers or
- articles on different data structures for free text systems?
-
- We're not going to reimplement it now, we're just interested in why
- people made particular decisions, where the tradeoffs are, etc. For
- instance, we have hierarchically structured SGML files and can search
- for words between arbitrary start and end tags; some systems have
- fixed numbers of fields and store the presence of a word in that
- field in a bit vector. It's things like that we're interested in.
- Which tradeoffs for speed and size are considered worthwhile by the
- users - e.g. would they be upset that such-and-such a structure doesn't
- allow sentence or qualified proximity searching?
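
To make the tradeoff above concrete, here is a rough sketch in Python of the two styles of index being contrasted. It is an illustration only, not how any particular commercial system works: the field names, the function names and the whitespace tokenizer are all assumptions invented for the example. The first index records only which of a fixed set of fields a word occurs in, as one bit per field; the second keeps word positions, which costs more space but is what proximity queries need.

```python
from collections import defaultdict

# Hypothetical fixed field set; a bit-vector scheme needs to know these up front.
FIELDS = ["title", "abstract", "body"]
FIELD_BIT = {name: 1 << i for i, name in enumerate(FIELDS)}


def build_bitvector_index(docs):
    """docs: {doc_id: {field: text}} -> {word: {doc_id: field bitmask}}."""
    index = defaultdict(dict)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for word in text.lower().split():
                # Set the bit for this field; word positions are discarded.
                index[word][doc_id] = index[word].get(doc_id, 0) | FIELD_BIT[field]
    return index


def search_field(index, word, field):
    """Documents in which `word` occurs somewhere in `field`."""
    bit = FIELD_BIT[field]
    postings = index.get(word.lower(), {})
    return sorted(doc for doc, mask in postings.items() if mask & bit)


def build_positional_index(docs):
    """docs: {doc_id: {field: text}} -> {word: [(doc_id, field, position)]}."""
    index = defaultdict(list)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for pos, word in enumerate(text.lower().split()):
                index[word].append((doc_id, field, pos))
    return index


def search_near(index, w1, w2, max_gap):
    """Documents where w1 and w2 occur within max_gap words in the same field."""
    hits = set()
    for d1, f1, p1 in index.get(w1.lower(), []):
        for d2, f2, p2 in index.get(w2.lower(), []):
            if d1 == d2 and f1 == f2 and abs(p1 - p2) <= max_gap:
                hits.add(d1)
    return sorted(hits)
```

The bit-vector index answers "is this word in that field?" with one mask test per document, but it cannot answer search_near at all, because the positions were thrown away when it was built. That is the kind of speed and size versus capability tradeoff the question is about.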
-
- Replies either here or by email gratefully received.
-
- Many thanks
-
- Graham
-