home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!munnari.oz.au!bunyip.cc.uq.oz.au!uqcspe!cs.uq.oz.au!grue
- From: grue@cs.uq.oz.au (Pauli)
- Newsgroups: rec.puzzles
- Subject: Re: Computer Dictionary Searches
- Message-ID: <11812@uqcspe.cs.uq.oz.au>
- Date: 22 Jan 93 02:58:40 GMT
- References: <1993Jan20.185756.16017@njitgw.njit.edu> <11811@uqcspe.cs.uq.oz.au>
- Sender: news@cs.uq.oz.au
- Reply-To: grue@cs.uq.oz.au
- Lines: 57
-
- warwick@cs.uq.oz.au (Warwick Allison) writes:
-
- >jpk2581@hertz.njit.edu (joseph p kisenwether apmt stnt) writes:
-
- >>I am trying to create some original word puzzles and need to find
- >>words with certain characteristsics:
- >>For example
- >> two words of more than ten letters which differ by only one
- >>stictly internal letter such that there exist two other letter
- >>(one to the left and one to the right) such that in word one
- >>there exists no sub-word containing both the changable letter
- >>and the one to the left while in word two there exists no sub-word
- >>conatining both the changable letter and the one to the right.
-
- Very complex problem. The expensive bit will be the finding of the pairs
- of ten letter words. The testing if something is a word is very very fast
- if you follow the scheme Warwick proposes. Even better you can precompute
- most of this stuff. Keep a table of 10 letter words and the positions where
- the changable letter satisfies the condition for left and right. My word
- list contains about 40000 10 letter words, so this isn't too excessive a task.
-
-
- >>c) what the format is so that I can right the appropriate search
- >> program
-
- >A Trie. You could use a DAWG, but we (Paul Dale & I) have found that
- >a Trie is better, because you can store suplementary data for arcs
- >in the tree.
-
- The dictionary comes in plain text. A DAWG or Trie happens to be a very
- efficient way of searching for words given some constraints (which can
- be quite general) or for determining if this word is really a word (complexity
- is O(length of word) with a small constant multiplier, which should be as good
- as a hash table).
-
- Interestingly, the construction of a Trie from a plain text word list is
- O(length of word list) and conversion from that to a DAWG is O(n log n) where
- n is the number of nodes in the Trie. This does assume that you have plenty
- of space to hold intermediate structures before you pack everything down.
-
- I've also implemented a best of both worlds approach: A DAWG containing the
- bulk of the word list and a secondary Trie that details additions/deletions
- from the DAWG. This gives the space savings of a DAWG with the editing
- facilities of a Trie. It doesn't allow data to be tagged with every word
- like a pure Trie would.
-
-
-
- Pauli
-
- Paul Dale | grue@cs.uq.oz.au
- Department of Computer Science | +61 7 365 2445
- University of Queensland |
- Australia, 4072 | Did you know that there are 41 two letter
- | words containing the letter 'a'?
-
- --
-