- Building new dictionaries
- ~~~~~~~~~~~~~~~~~~~~~~~~~
- by Jesper Skov
-
-
- Making new dictionaries is not very hard, but you'll have to interpret the
- Makefile yourself.
-
- Below is a complete example where I rebuild the English and Danish
- dictionaries after re-compiling the ispell programs with the MASKBITS
- variable set to 128.
-
-
- [First the fix8bit tool is compiled - you will only need this with certain
- languages, e.g. Danish as we have funky letters :) ]
-
- >cd languages/
- >make fix8bit
- + gcc -O2 -DAMIGA -Iinclude: -o fix8bit fix8bit.c
-
-
- [Then the English dictionary is built. It consists of multiple word lists, so I
- use sort to merge them into a single word list. You can control which sublists
- are included, and thereby the size and "power" of the dictionary. See the
- Makefile for some pre-defined dictionary sizes.]
-
- >cd english/
- >dir
- -----rw-d 4 1769 Jan 23 1995 altamer.0
- -----rw-d 1 402 Nov 2 1994 altamer.1
- -----rw-d 2 856 Nov 2 1994 altamer.2
- -----rw-d 18 8831 Jan 23 1995 american.0
- -----rw-d 9 4410 Jan 23 1995 american.1
- -----rw-d 80 40591 Jan 23 1995 american.2
- -----rw-d 19 9477 Jan 23 1995 british.0
- -----rw-d 9 4500 Jan 23 1995 british.1
- -----rw-d 81 41194 Jan 23 1995 british.2
- -----rw-d 364 186058 Jan 23 1995 english.0
- -----rw-d 270 137937 Jan 23 1995 english.1
- -----rw-d 618 316348 Jan 23 1995 english.2
- -----rw-d 338 172832 Jan 23 1995 english.3
- -----rw-d 14 6916 Jan 25 1994 english.4l
- -----rw-d 12 5688 Jan 23 1995 english.aff
- -----rw-d 35 17536 Nov 2 1994 Makefile
- ----arwed 27 13670 Jan 23 1995 msgs.h
- Dirs:0 Files:17 Blocks:1901 Bytes:969015
- >bin:sort -u -t/ +0f -1 +0 -o english.med english.0 american.0 altamer.0 british.0 english.1 american.1 altamer.1 british.1
- >dir
- -----rw-d 4 1769 Jan 23 1995 altamer.0
- -----rw-d 1 402 Nov 2 1994 altamer.1
- -----rw-d 2 856 Nov 2 1994 altamer.2
- -----rw-d 18 8831 Jan 23 1995 american.0
- -----rw-d 9 4410 Jan 23 1995 american.1
- -----rw-d 80 40591 Jan 23 1995 american.2
- -----rw-d 19 9477 Jan 23 1995 british.0
- -----rw-d 9 4500 Jan 23 1995 british.1
- -----rw-d 81 41194 Jan 23 1995 british.2
- -----rw-d 364 186058 Jan 23 1995 english.0
- -----rw-d 270 137937 Jan 23 1995 english.1
- -----rw-d 618 316348 Jan 23 1995 english.2
- -----rw-d 338 172832 Jan 23 1995 english.3
- -----rw-d 14 6916 Jan 25 1994 english.4l
- -----rw-d 12 5688 Jan 23 1995 english.aff
- -----rwed 688 351911 Sep 14 15:57 english.med
- -----rw-d 35 17536 Nov 2 1994 Makefile
- ----arwed 27 13670 Jan 23 1995 msgs.h
- Dirs:0 Files:18 Blocks:2589 Bytes:1320926
- >buildhash english.med english.aff english.hash
- Counting words in dictionary ...
- 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000
- 17000 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000 28000 29000 30000
- 31000 32000
- 32433 words
- 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000
- 16000 17000 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000 28000 29000
- 30000 31000 32000
- >dir
- -----rw-d 4 1769 Jan 23 1995 altamer.0
- -----rw-d 1 402 Nov 2 1994 altamer.1
- -----rw-d 2 856 Nov 2 1994 altamer.2
- -----rw-d 18 8831 Jan 23 1995 american.0
- -----rw-d 9 4410 Jan 23 1995 american.1
- -----rw-d 80 40591 Jan 23 1995 american.2
- -----rw-d 19 9477 Jan 23 1995 british.0
- -----rw-d 9 4500 Jan 23 1995 british.1
- -----rw-d 81 41194 Jan 23 1995 british.2
- -----rw-d 364 186058 Jan 23 1995 english.0
- -----rwed 1 6 Sep 14 15:58 english.0.cnt
- -----rwed 5 2106 Sep 14 15:58 english.0.stat
- -----rw-d 270 137937 Jan 23 1995 english.1
- -----rw-d 618 316348 Jan 23 1995 english.2
- -----rw-d 338 172832 Jan 23 1995 english.3
- -----rw-d 14 6916 Jan 25 1994 english.4l
- -----rw-d 12 5688 Jan 23 1995 english.aff
- -----rwed 2255 1154482 Sep 21 14:20 english.hash
- -----rwed 688 351911 Sep 14 15:57 english.med
- -----rwed 1 6 Sep 21 14:20 english.med.cnt
- -----rwed 5 2107 Sep 21 14:20 english.med.stat
- -----rw-d 35 17536 Nov 2 1994 Makefile
- ----arwed 27 13670 Jan 23 1995 msgs.h
- Dirs:0 Files:23 Blocks:4856 Bytes:2479633
- >copy english.aff english.hash english.med english.med.cnt english.med.stat ispell:lib
- >cd /
-
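For readers without a sort that accepts the old "+0f -1 +0" key syntax, the merge step above can be approximated in a few lines of Python. This is only a sketch of how I read the flags: lines are split at "/" (the -t/ option), ordered first by the word before the slash with case folded, then by the whole line, and exact duplicate lines are dropped (-u). The function name merge_wordlists is made up for this illustration, and Python compares by code point, which may differ from your sort's collation for non-ASCII words.

```python
def merge_wordlists(wordlists):
    """Merge several word lists into one sorted list without duplicate lines.

    Each element of `wordlists` is an iterable of lines such as "apple/S".
    This approximates `sort -u -t/ +0f -1 +0`; it is not a drop-in replacement.
    """
    unique = set()
    for lines in wordlists:
        for line in lines:
            # Strip only the line terminator; affix flags after "/" are kept.
            unique.add(line.rstrip("\n"))

    def sort_key(line):
        # First key: the word before "/", case folded (the "+0f -1" part).
        # Second key: the whole line, case sensitive (the "+0" part).
        word = line.split("/", 1)[0]
        return (word.upper(), line)

    return sorted(unique, key=sort_key)
```

To use it on real files, open each word list, pass the open file objects in a list, and write the returned lines to english.med, one per line.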
- [Now rebuild the Danish dictionary. There is only one word list, so sort is
- not needed. The fix8bit tool converts the 7-bit affix file (dansk.7bit) to
- its 8-bit form (dansk.aff). BTW: the word list comes from one of the sites
- suggested in languages/Where. It is not part of the Ispell distribution.]
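To give a feel for what an 8-bit conversion of this kind involves, here is a small Python sketch. It rests on an assumption that is not confirmed by this document: that the 7-bit file spells each 8-bit character as a backslash escape of the form \xNN (two hex digits). The transcript shows dansk.aff coming out 150 bytes smaller than dansk.7bit, which fits multi-character escapes collapsing to single bytes, but check fix8bit.c for the real format before relying on this. The name to_8bit is hypothetical.

```python
import re

# ASSUMPTION: escapes look like \xE6 (backslash, "x", two hex digits).
# The real fix8bit tool may accept a different or broader syntax.
_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def to_8bit(data: bytes) -> bytes:
    """Replace every assumed \\xNN escape with the single byte it names."""
    return _ESCAPE.sub(lambda match: bytes([int(match.group(1), 16)]), data)
```

The function works on bytes rather than text so that no character encoding has to be guessed; everything that is not an escape passes through unchanged.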
-
- >cd dansk/
- >dir
- -----rw-d 11 5464 Jan 23 1995 dansk.7bit
- -----rw-d 632 323386 Jun 29 19:53 dansk.med
- -----rw-d 9 4594 Nov 2 1994 Makefile
- Dirs:0 Files:4 Blocks:663 Bytes:338758
- >../fix8bit -8 < dansk.7bit > dansk.aff
- >dh3:ispell-3.1.18Work/buildhash dansk.med dansk.aff dansk.hash
- Counting words in dictionary ...
- 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000
- 17000 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000
- 27606 words
- 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000
- 16000 17000 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000
- >dir
- -----rw-d 11 5464 Jan 23 1995 dansk.7bit
- -----rwed 11 5314 Sep 21 14:40 dansk.aff
- -----rwed 2091 1070528 Sep 21 14:41 dansk.hash
- -----rw-d 632 323386 Jun 29 19:53 dansk.med
- -----rwed 1 6 Sep 21 14:41 dansk.med.cnt
- -----rwed 5 2106 Sep 21 14:41 dansk.med.stat
- -----rw-d 9 4594 Nov 2 1994 Makefile
- Dirs:0 Files:7 Blocks:2760 Bytes:1411398
- >copy dansk.aff dansk.hash dansk.med dansk.med.cnt dansk.med.stat ispell:lib
- dansk.aff..copied.
- dansk.hash..copied.
- dansk.med..copied.
- dansk.med.cnt..copied.
- dansk.med.stat..copied.
- >
-
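Both transcripts above follow the same pattern: run buildhash on a word list and an affix file, then copy five files to ispell:lib. The Python sketch below only records that pattern as data so it can be reused for another language; it does not run anything. The helper name build_plan is invented here, and the ".cnt" and ".stat" names are taken from the directory listings above rather than from the buildhash documentation.

```python
def build_plan(wordlist, affix, hashfile):
    """Return (buildhash argument list, files to copy to ispell:lib).

    Mirrors the transcripts: buildhash <wordlist> <affix> <hashfile>,
    which leaves <wordlist>.cnt and <wordlist>.stat next to the hash file.
    """
    command = ["buildhash", wordlist, affix, hashfile]
    install = [
        affix,
        hashfile,
        wordlist,
        wordlist + ".cnt",   # word count file seen in the listings
        wordlist + ".stat",  # statistics file seen in the listings
    ]
    return command, install
```

For example, build_plan("dansk.med", "dansk.aff", "dansk.hash") yields the buildhash arguments and the copy list used in the Danish transcript.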
- That's it. I hope this little document will make it easier for you to build
- dictionaries. If there are any "bugs" in this doc, please inform me thereof!
-
- /Jesper
-