NetNews Usenet Archive 1993 #3

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #3 / NN_1993_3.iso / spool / rec / puzzles / 8504 < prev next >

Wrap

Internet Message Format | 1993-01-21 | 2.9 KB

Path: sparky!uunet!munnari.oz.au!bunyip.cc.uq.oz.au!uqcspe!cs.uq.oz.au!grue From: grue@cs.uq.oz.au (Pauli) Newsgroups: rec.puzzles Subject: Re: Computer Dictionary Searches Message-ID: <11812@uqcspe.cs.uq.oz.au> Date: 22 Jan 93 02:58:40 GMT References: <1993Jan20.185756.16017@njitgw.njit.edu> <11811@uqcspe.cs.uq.oz.au> Sender: news@cs.uq.oz.au Reply-To: grue@cs.uq.oz.au Lines: 57 warwick@cs.uq.oz.au (Warwick Allison) writes: >jpk2581@hertz.njit.edu (joseph p kisenwether apmt stnt) writes: >>I am trying to create some original word puzzles and need to find >>words with certain characteristsics: >>For example >> two words of more than ten letters which differ by only one >>stictly internal letter such that there exist two other letter >>(one to the left and one to the right) such that in word one >>there exists no sub-word containing both the changable letter >>and the one to the left while in word two there exists no sub-word >>conatining both the changable letter and the one to the right. Very complex problem. The expensive bit will be the finding of the pairs of ten letter words. The testing if something is a word is very very fast if you follow the scheme Warwick proposes. Even better you can precompute most of this stuff. Keep a table of 10 letter words and the positions where the changable letter satisfies the condition for left and right. My word list contains about 40000 10 letter words, so this isn't too excessive a task. >>c) what the format is so that I can right the appropriate search >> program >A Trie. You could use a DAWG, but we (Paul Dale & I) have found that >a Trie is better, because you can store suplementary data for arcs >in the tree. The dictionary comes in plain text. A DAWG or Trie happens to be a very efficient way of searching for words given some constraints (which can be quite general) or for determining if this word is really a word (complexity is O(length of word) with a small constant multiplier, which should be as good as a hash table). Interestingly, the construction of a Trie from a plain text word list is O(length of word list) and conversion from that to a DAWG is O(n log n) where n is the number of nodes in the Trie. This does assume that you have plenty of space to hold intermediate structures before you pack everything down. I've also implemented a best of both worlds approach: A DAWG containing the bulk of the word list and a secondary Trie that details additions/deletions from the DAWG. This gives the space savings of a DAWG with the editing facilities of a Trie. It doesn't allow data to be tagged with every word like a pure Trie would. Pauli Paul Dale | grue@cs.uq.oz.au Department of Computer Science | +61 7 365 2445 University of Queensland | Australia, 4072 | Did you know that there are 41 two letter | words containing the letter 'a'? --