This program allows you to search a highly compressed list of words. It
can be used for looking up spellings and cheating at various word games.
Currently searches are specified by a substring (with '?' as a wildcard
character) and/or a length.

I have encoded three dictionaries for this program. The raw material came
from ftp sites listed in the rec.puzzles.crosswords FAQ.

    dictionary     words    raw size   .pdb size   ratio
    ----------   -------   ---------   ---------   -----
    pocket        21,111     176,662      45,130     3.9
    ospd          79,339     614,670     100,561     6.1
    unabrd       235,544   2,492,342     516,428     4.8

Compression is performed in two steps. First, the beginning of each word
is stripped off and replaced by a count of how many characters to copy
from the beginning of the previous word. Then the file is compressed
using context-sensitive Huffman codes (different codes are used depending
on the previous character).

To evaluate the performance of the context-sensitive Huffman codes, I
compressed the prefix-stripped ospd file using several other methods.

    program     size    comments
    -------   ------    --------
    ha         78253    PPM (arithmetic coding, variable-length context)
    bzip       79607    Burrows-Wheeler
    gzip       97409    LZ77 program, same algorithm as pkzip
    lzo       132179    LZ77 library, used flags -m972 -b10000

PPM and Burrows-Wheeler aren't suitable for the Pilot because they need a
lot of working memory. The lzo library by Markus Franz Xaver Johannes
Oberhumer is attractive because it uses no memory and is very fast, but
its compression isn't as good as the context-sensitive Huffman codes that
I used.

It seems to me that if you are going to start trading compression away
for speed, then you should go all the way to the data structures
described in the article below. Searches could be hundreds of times
faster while the dictionary would be less than twice as big.

    Andrew Appel and Guy Jacobson, "The World's Fastest Scrabble
    Program", Communications of the ACM, May 1988.
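The search described above (substring with '?' as a one-letter wildcard,
plus an optional length) can be sketched as a simple predicate. Python is
used only for illustration; the function name and signature are my own,
not the program's actual API.

```python
def matches(word, pattern, length=None):
    """True if word contains pattern as a substring ('?' matches any
    single character) and, if length is given, has exactly that length."""
    if length is not None and len(word) != length:
        return False
    # Slide the pattern over every starting position in the word.
    for i in range(len(word) - len(pattern) + 1):
        if all(p == "?" or p == w for p, w in zip(pattern, word[i:])):
            return True
    return False
```

For example, `matches("scrabble", "r?b")` is true (the pattern lines up
with "rab"), while `matches("scrap", "r?b")` is false.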
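The first compression step, stripping each word's shared prefix and
replacing it with a copy count, is classic front coding over a sorted
list. A minimal sketch (in Python for clarity; the actual Pilot code
would be C, and the on-disk byte layout is not specified here):

```python
def front_encode(words):
    """Replace each word's shared prefix with a count of characters
    to copy from the previous word, keeping only the new suffix."""
    encoded = []
    prev = ""
    for word in words:
        # Count leading characters in common with the previous word.
        n = 0
        while n < min(len(prev), len(word)) and prev[n] == word[n]:
            n += 1
        encoded.append((n, word[n:]))
        prev = word
    return encoded

def front_decode(encoded):
    """Rebuild the word list from (copy count, suffix) pairs."""
    words = []
    prev = ""
    for n, suffix in encoded:
        word = prev[:n] + suffix
        words.append(word)
        prev = word
    return words
```

For a sorted list like ["scrabble", "scrap", "scrape"] this yields
[(0, "scrabble"), (4, "p"), (5, "e")], which is why it works so well on
alphabetized dictionaries: neighboring words share long prefixes.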
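The second step, context-sensitive Huffman coding, keeps a separate
Huffman table for each possible previous character. The sketch below uses
the textbook tree-building algorithm; the document does not describe how
the real program stores its tables, so everything here (names, the "^"
start-of-word context) is illustrative only.

```python
import heapq
from collections import Counter, defaultdict

def huffman_code(freqs):
    """Build a prefix code (symbol -> bit string) from frequencies."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Heap entries carry a unique counter so tuple comparison never
    # falls through to the (unorderable) code dictionaries.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

def encode_with_context(text, start="^"):
    """Encode text, choosing the Huffman table by the previous char."""
    # Pass 1: frequency of each character given its predecessor.
    ctx_freqs = defaultdict(Counter)
    prev = start
    for ch in text:
        ctx_freqs[prev][ch] += 1
        prev = ch
    tables = {ctx: huffman_code(f) for ctx, f in ctx_freqs.items()}
    # Pass 2: emit each character's code from its context's table.
    bits = []
    prev = start
    for ch in text:
        bits.append(tables[prev][ch])
        prev = ch
    return "".join(bits), tables
```

The payoff is that letter frequencies in English are highly skewed by the
preceding letter (after 'q' almost everything is 'u'), so per-context
tables assign much shorter codes than a single global Huffman table can.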