home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Boot Disc 8
/
boot-disc-1997-04.iso
/
PDA_Soft
/
Pilot
/
Apps
/
WordList.02
/
readme.txt
< prev
next >
Wrap
Text File
|
1997-02-05
|
2KB
|
45 lines
This program allows you to search a highly compressed list of words.
It can be used for looking up spellings and cheating at various word
games. Currently searches are specified by a substring (with '?' as a
wildcard character) and/or a length.
I have encoded three dictionaries for this program. The raw material
came from ftp sites listed in the rec.puzzles.crosswords FAQ.
dictionary words raw size .pdb size ratio
---------- ------- --------- --------- -----
pocket 21,111 176,662 45,130 3.9
ospd 79,339 614,670 100,561 6.1
unabrd 235,544 2,492,342 516,428 4.8
Compression is performed in two steps. First the beginning of each
word is stripped off and replaced by a count of how many characters to
copy from the beginning of the previous word. Then the file is
compressed using context-sensitive huffman codes (different codes are
used depending on the previous character.)
To evaluate the performance of the context-sensitive huffman codes, I
compressed the prefix-stripped ospd file using several other methods.
program size comments
-------- ------ ---------
ha 78253 PPM (arithmetic coding, variable length context)
bzip 79607 Burroughs-Wheeler
gzip 97409 LZ77 program, same algorithm as pkzip
lzo 132179 LZ77 library, used flags -m972 -b10000
PPM and Burroughs-Wheeler aren't suitable for the Pilot because they
need a lot of working memory. The lzo library by Markus Franz Xaver
Johannes Oberhumer is attractive because it uses no memory and is very
fast, but the compression isn't as good as the context-sensitive
huffman codes that I used. It seems to me that if you are going to
start trading compression away for speed, then you should go all the
way to the data structures described in the article below. Searches
could be hundreds of times faster while the dictionary would be less
than twice as big.
Andrew Appel and Guy Jacobsen, "The World's Fastest Scrabble Program",
Communications of the ACM, May 1988.