This program allows you to search a highly compressed list of words. It
can be used for looking up spellings and cheating at various word games.
Currently searches are specified by a substring (with '?' as a wildcard
character) and/or a length.

I have encoded three dictionaries for this program. The raw material came
from ftp sites listed in the rec.puzzles.crosswords FAQ.

    dictionary     words    raw size   .pdb size   ratio
    ----------   -------   ---------   ---------   -----
    pocket        21,111     176,662      45,130     3.9
    ospd          79,339     614,670     100,561     6.1
    unabrd       235,544   2,492,342     516,428     4.8

Compression is performed in two steps. First, the beginning of each word
is stripped off and replaced by a count of how many characters to copy
from the beginning of the previous word. Then the file is compressed
using context-sensitive Huffman codes (different codes are used depending
on the previous character).

To evaluate the performance of the context-sensitive Huffman codes, I
compressed the prefix-stripped ospd file using several other methods.

    program     size    comments
    -------   ------    --------
    ha         78253    PPM (arithmetic coding, variable-length context)
    bzip       79607    Burrows-Wheeler
    gzip       97409    LZ77 program, same algorithm as pkzip
    lzo       132179    LZ77 library, used flags -m972 -b10000

PPM and Burrows-Wheeler aren't suitable for the Pilot because they need a
lot of working memory. The lzo library by Markus Franz Xaver Johannes
Oberhumer is attractive because it uses no memory and is very fast, but
its compression isn't as good as the context-sensitive Huffman codes that
I used.

It seems to me that if you are going to start trading compression away
for speed, then you should go all the way to the data structures
described in the article below. Searches could be hundreds of times
faster while the dictionary would be less than twice as big.

    Andrew Appel and Guy Jacobson, "The World's Fastest Scrabble
    Program", Communications of the ACM, May 1988.
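The search described above (substring with '?' as a one-letter wildcard,
plus an optional length) can be sketched as a simple predicate. Python is
used only for illustration; the function name and signature are my own,
not the program's actual API.

```python
def matches(word, pattern, length=None):
    """True if word contains pattern as a substring ('?' matches any
    single character) and, if length is given, has exactly that length."""
    if length is not None and len(word) != length:
        return False
    # Slide the pattern over every starting position in the word.
    for i in range(len(word) - len(pattern) + 1):
        if all(p == "?" or p == w for p, w in zip(pattern, word[i:])):
            return True
    return False
```

For example, `matches("scrabble", "r?b")` is true (the pattern lines up
with "rab"), while `matches("scrap", "r?b")` is false.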
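The first compression step, stripping each word's shared prefix and
replacing it with a copy count, is classic front coding over a sorted
list. A minimal sketch (in Python for clarity; the actual Pilot code
would be C, and the on-disk byte layout is not specified here):

```python
def front_encode(words):
    """Replace each word's shared prefix with a count of characters
    to copy from the previous word, keeping only the new suffix."""
    encoded = []
    prev = ""
    for word in words:
        # Count leading characters in common with the previous word.
        n = 0
        while n < min(len(prev), len(word)) and prev[n] == word[n]:
            n += 1
        encoded.append((n, word[n:]))
        prev = word
    return encoded

def front_decode(encoded):
    """Rebuild the word list from (copy count, suffix) pairs."""
    words = []
    prev = ""
    for n, suffix in encoded:
        word = prev[:n] + suffix
        words.append(word)
        prev = word
    return words
```

For a sorted list like ["scrabble", "scrap", "scrape"] this yields
[(0, "scrabble"), (4, "p"), (5, "e")], which is why it works so well on
alphabetized dictionaries: neighboring words share long prefixes.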
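The second step, context-sensitive Huffman coding, keeps a separate
Huffman table for each possible previous character. The sketch below uses
the textbook tree-building algorithm; the document does not describe how
the real program stores its tables, so everything here (names, the "^"
start-of-word context) is illustrative only.

```python
import heapq
from collections import Counter, defaultdict

def huffman_code(freqs):
    """Build a prefix code (symbol -> bit string) from frequencies."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Heap entries carry a unique counter so tuple comparison never
    # falls through to the (unorderable) code dictionaries.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

def encode_with_context(text, start="^"):
    """Encode text, choosing the Huffman table by the previous char."""
    # Pass 1: frequency of each character given its predecessor.
    ctx_freqs = defaultdict(Counter)
    prev = start
    for ch in text:
        ctx_freqs[prev][ch] += 1
        prev = ch
    tables = {ctx: huffman_code(f) for ctx, f in ctx_freqs.items()}
    # Pass 2: emit each character's code from its context's table.
    bits = []
    prev = start
    for ch in text:
        bits.append(tables[prev][ch])
        prev = ch
    return "".join(bits), tables
```

The payoff is that letter frequencies in English are highly skewed by the
preceding letter (after 'q' almost everything is 'u'), so per-context
tables assign much shorter codes than a single global Huffman table can.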