OS/2 Shareware BBS: Product

Product.zip / ISPSRC.ZIP / tree.c

C/C++ Source or Header | 1992-09-10 | 23.9 KB | 855 lines

/* -*- Mode:Text -*- */
#ifndef lint
static char Rcs_Id[] =
    "$Id: tree.c,v 1.44 91/08/10 14:09:49 geoff Exp $";
#endif

/*
 * tree.c - a hash style dictionary for user's personal words
 *
 * Pace Willisson, 1983
 * Hash support added by Geoff Kuenning, 1987
 *
 * Copyright 1987, 1988, 1989, by Geoff Kuenning, Manhattan Beach, CA
 * Permission for non-profit use is hereby granted.
 * All other rights reserved.
 * See "version.h" for a more complete copyright notice.
 */

/*
 * $Log: tree.c,v $
 * Revision 1.44 91/08/10 14:09:49 geoff
 * Don't issue TREE_C_CANT_UPDATE if the personal dictionary doesn't
 * exist at all. Sleep for 2 seconds after issuing the message, so that
 * the user has a chance to see it before ispell clears the screen.
 *
 * Revision 1.43 91/07/27 20:48:39 geoff
 * Major rewrite of the personal-dictionary initialization to support a
 * personal dictionary in the local directory, and to merge it with the
 * one from the home directory if possible. Fix a compile error and a
 * bug in the REGEX_LOOKUP code.
 *
 * Revision 1.42 91/07/15 19:27:14 geoff
 * Make tinsert static. Mention that the argument to treeinsert must be
 * canonical. Provide the "canonical" parameter to all strtoichar,
 * strtosichar, and casecmp calls. Make sure the argument to casecmp is
 * canonical.
 *
 * Revision 1.41 91/07/11 19:52:29 geoff
 * Remove the include of stdio.h, since ispell.h now does this.
 *
 * Revision 1.40 91/07/05 20:32:00 geoff
 * Fix some more lint complaints.
 *
 * Revision 1.39 91/07/03 18:21:14 geoff
 * Don't include types.h, dir.h, or param.h, since config.h now does that.
 *
 * Revision 1.38 91/06/23 22:01:04 geoff
 * On non-USG systems, include sys/dir.h for MAXNAMLEN
 *
 * Revision 1.37 91/01/24 02:28:12 geoff
 * Modify dictionary-finding code to be more consistent and predictable
 * about where it looks, to create nonexistent dictionaries in $HOME, and
 * to fix a bug that caused private dictionaries in the home to be clobbered.
 *
 * Revision 1.36 90/12/31 00:59:34 geoff
 * Reformat to follow a consistent convention throughout ispell
 *
 * Revision 1.35 90/04/26 22:44:31 geoff
 * Add the canonicalize parameter to the call to ichartostr.
 *
 * Revision 1.34 89/12/27 03:19:02 geoff
 * Move all messages to msgs.h so they can be reconfigured
 *
 * Revision 1.33 89/07/11 00:25:41 geoff
 * Add REGEX_LOOKUP support and Amiga support, based on that from
 * luis@rice.edu. Also change the name of the personal hash table to better
 * distinguish it from the main one.
 *
 * Revision 1.32 89/06/09 15:56:53 geoff
 * Add support for the internal "character" type, ichar_t.
 *
 * Revision 1.31 89/04/28 01:17:36 geoff
 * Change Header to Id; nobody cares about my pathnames.
 *
 * Revision 1.30 89/04/03 01:59:15 geoff
 * Fix a bunch of lint complaints.
 *
 * Revision 1.29 88/12/26 02:33:53 geoff
 * Add a copyright notice.
 *
 * Revision 1.28 88/10/20 20:53:27 geoff
 * Add support for getting a different personal dictionary when the -d
 * switch is specified, for multilingual use of ispell.
 *
 * Revision 1.27 88/04/30 22:16:25 geoff
 * Fix some lint complaints.
 *
 * Revision 1.26 88/03/27 01:06:03 geoff
 * Fix a whole bunch of problems with the CAPITALIZATION option.
 *
 * Revision 1.25 88/03/12 02:46:03 geoff
 * Check the return status from makedent, and don't idiotically try to
 * continue if it fails.
 *
 * Revision 1.24 88/02/20 23:15:06 geoff
 * Major changes to support the new capitalization handling, and to use
 * the new subroutines in the "makedent.c" module.
 *
 * Revision 1.23 87/09/30 23:32:04 geoff
 * Move some globals to ispell.h.
 *
 * Revision 1.22 87/09/26 15:53:07 geoff
 * Make buffer sizes be more consistent.
 *
 * Revision 1.21 87/09/24 23:24:35 geoff
 * When changing a lowercase word to followcase, remember to set the
 * capitalize flags. (Bart Schaefer).
 *
 * Revision 1.20 87/09/09 00:20:10 geoff
 * Fix treeoutput() to be nondestructive (Doug Lind, Michael Wester). It's
 * faster too, since it now uses qsort. Also fix several bugs in
 * treeinsert: capitalization information wasn't checked on memory
 * overflow, and duplicate followcase entries weren't detected.
 *
 * Revision 1.19 87/09/03 19:26:59 geoff
 * Simplify the hash-size table now that we don't read entire dictionaries
 * into the personal-hash tree.
 *
 * Revision 1.18 87/07/20 23:23:27 geoff
 * Modify to support the new flag format used with language tables.
 *
 * Revision 1.17 87/06/07 15:04:25 geoff
 * Don't refer to capitalization flags if CAPITALIZE is off (Gary Johnson)
 *
 * Revision 1.16 87/05/27 23:28:40 geoff
 * Reset hasslash in toutword so every word gets a slash
 *
 * Revision 1.15 87/05/11 11:50:35 geoff
 * Speed up SORTPERSONAL dramatically by not scanning unkept words repeatedly.
 *
 * Revision 1.14 87/04/19 22:53:55 geoff
 * Add SORTPERSONAL and capitalization support.
 *
 * Revision 1.13 87/04/02 12:25:55 geoff
 * Remove the obsolete treeprint() routine. Put the "can't update
 * personal dict" error message onto stderr, not stdout.
 *
 * Revision 1.12 87/04/01 15:23:06 geoff
 * Integrate Joe Orost's V7/register changes into the main branch
 *
 * Revision 1.11 87/03/28 19:33:24 geoff
 * Put the personal dictionary out in compressed (no extra slashes) form
 *
 * Revision 1.10 87/03/27 17:21:48 geoff
 * Accept (but don't require) new-format dictionaries without extra slashes.
 * However, don't generate them on dumps yet.
 *
 * Revision 1.9 87/03/22 23:28:03 geoff
 * Integrate Perry Smith's changes into the main branch
 *
 * Revision 1.8 87/03/12 23:36:54 geoff
 * Fix treeprint, which didn't catch up to the latest changes. Remove the
 * list-link following in the main hash table in treeoutput, since the
 * entries will eventually be found by the linear search.
 *
 * Revision 1.7 87/03/10 23:33:13 geoff
 * Add code to deal gracefully with memory allocation failures. If small
 * allocations fail, put out a message and exit. If a big hash table allocation
 * fails, put out a message, but continue anyway, just filling the existing
 * table overfull.
 *
 * Revision 1.6 87/03/08 20:31:27 geoff
 * Improve the hash sizes table to be appropriate to really big dictionaries.
 * Change the personal-dictionary opening code to be more flexible about
 * where it looks, so that relative pathnames will work sensibly.
 * Make lots of changes in the hashing so it works better and faster.
 *
 * Revision 1.5 87/03/01 00:57:17 geoff
 * Major changes to use a hash table instead of a binary tree, allow
 * user dictionaries to have suffixes, and integrate the user dictionary
 * with the main one.
 *
 * Revision 1.4 87/02/28 14:58:44 geoff
 * Modify to support suffix flags in the user dictionary.
 *
 * Revision 1.3 87/02/26 00:25:23 geoff
 * Integrate McQueer's enhancements into the main branch
 *
 * Revision 1.2 87/01/17 13:12:28 geoff
 * Add RCS ID keywords
 *
 */

#include <ctype.h>
#include <errno.h>
#include "config.h"
#include "ispell.h"
#include "msgs.h"

static int              cantexpand = 0; /* NZ if an expansion fails */
static struct dent *    pershtab;       /* Aux hash table for personal dict */
static int              pershsize = 0;  /* Space available in aux hash table */
static int              hcount = 0;     /* Number of items in hash table */

/*
 * Hash table sizes. Prime is probably a good idea, though in truth I
 * whipped the algorithm up on the spot rather than looking it up, so
 * who knows what's really best? If we overflow the table, we just
 * use a double-and-add-1 algorithm.
 */
static int goodsizes[] =
    {
    53, 223, 907, 3631
    };

static void             treeload ();
void                    treeinsert ();
static struct dent *    tinsert ();
struct dent *           treelookup ();

static char             personaldict[MAXPATHLEN];
static FILE *           dictf;
static                  newwords = 0;

extern struct dent *    lookup ();
extern void             upcase ();
void                    myfree ();

extern char *           index ();
extern char *           calloc ();
extern char *           malloc ();
extern char *           realloc ();
extern void             free ();
extern void             exit ();
extern void             perror ();
extern char *           getenv ();
extern char *           strcpy ();
extern void             qsort ();

treeinit (p, LibDict)
    char *      p;              /* Value specified in -p switch */
    char *      LibDict;        /* Root of default dict name */
    {
    int         abspath;        /* NZ if p is abs path name */
    char *      h;              /* Home directory name */
    char        seconddict[MAXPATHLEN]; /* Name of secondary dict */

    /*
    ** If -p was not specified, try to get a default name from the
    ** configuration file settings, and next from the
    ** environment. After this point, if p is null, the value in
    ** personaldict is the only possible name for the personal dictionary.
    ** If p is non-null, then there is a possibility that we should
    ** prepend HOME to get the correct dictionary name.
    */
#ifdef OS2
    /* try to get config file info 8/24/92 */
    if ((p == NULL) && (cfpersonaldict[0] != '\0'))
        p = cfpersonaldict;
#endif /* OS2 */
    if (p == NULL)
        p = getenv (PDICTVAR);
    /*
    ** if p exists and begins with '/' we don't really need HOME,
    ** but it's not very likely that HOME isn't set anyway.
    ** The HOME environment variable will not be sought under os/2.
    ** jbh 8/26/92
    */
#ifdef OS2
    h = cflibdir;   /* For os/2 home is going to be where ispell.exe is located */
#else
    if ((h = getenv ("HOME")) == NULL)
#ifdef AMIGA
        h = LIBDIR;
#else /* AMIGA */
        return;
#endif /* AMIGA */
#endif /* OS2 */

    if (p == NULL)
        {
        /*
         * No -p and no PDICTVAR. We will use LibDict and DEFPAFF to
         * figure out the name of the personal dictionary and where it
         * is. The rules are as follows:
         *
         * (1) If there is a dictionary in the local directory which
         *     is named after the hash file, this will become
         *     "personaldict", which is where any changes will be saved.
         * (2) If there is also a dictionary in $HOME, we will load
         *     it, regardless of whether the dictionary listed in (1)
         *     exists. If step (1) failed, this will also become
         *     "personaldict".
         * (3) If both previous steps fail, we will try them again,
         *     using DEFPAFF as the suffix.
         */
        (void) sprintf (personaldict, "%s%s", DEFPDICT, LibDict);
        (void) sprintf (seconddict, "%s/%s%s", h, DEFPDICT, LibDict);
        if ((dictf = fopen (personaldict, "r")) == NULL)
            personaldict[0] = '\0';
        else
            {
            treeload (dictf);
            (void) fclose (dictf);
            }
        if ((dictf = fopen (seconddict, "r")) == NULL)
            seconddict[0] = '\0';
        else
            {
            treeload (dictf);
            (void) fclose (dictf);
            }
        if (personaldict[0] == '\0')
            {
            if (seconddict[0] != '\0')
                (void) strcpy (personaldict, seconddict);
            else
                {
                (void) sprintf (personaldict, "%s%s", DEFPDICT, DEFPAFF);
                (void) sprintf (seconddict, "%s/%s%s", h,
                  DEFPDICT, DEFPAFF);
                if ((dictf = fopen (personaldict, "r")) == NULL)
                    (void) strcpy (personaldict, seconddict);
                else
                    {
                    treeload (dictf);
                    (void) fclose (dictf);
                    }
                if ((dictf = fopen (seconddict, "r")) != NULL)
                    {
                    treeload (dictf);
                    (void) fclose (dictf);
                    }
                }
            }
        }
    else
        {
        /*
        ** Figure out if p is an absolute path name. Note that beginning
        ** with "./" and "../" is considered an absolute path, since this
        ** still means we can't prepend HOME.
        **
        ** added by jbh 8/15/92: Must also look for drive specifiers
        ** as an indicator of absolute path under os/2. The filename is
        ** searched for the ":" character, since this indicates an
        ** absolute path
        */
#ifdef OS2
        abspath = (*p == '/' || strncmp (p, "./", 2) == 0
          || strncmp (p, "../", 3) == 0 || strchr (p, ':') != NULL);
#else
        abspath = (*p == '/' || strncmp (p, "./", 2) == 0
          || strncmp (p, "../", 3) == 0);
#endif /* OS2 */
        if (abspath)
            {
            (void) strcpy (personaldict, p);
            if ((dictf = fopen (personaldict, "r")) != NULL)
                {
                treeload (dictf);
                (void) fclose (dictf);
                }
            }
        else
            {
            /*
            ** The user gave us a relative pathname. We will try it
            ** locally, and if that doesn't work, we'll try the home
            ** directory. If neither exists, it will be created in
            ** the home directory if words are added.
            */
            (void) strcpy (personaldict, p);
            if ((dictf = fopen (personaldict, "r")) != NULL)
                {
                treeload (dictf);
                (void) fclose (dictf);
                }
            else if (!abspath)
                {
                /* Try the home */
                (void) sprintf (personaldict, "%s/%s", h, p);
                if ((dictf = fopen (personaldict, "r")) != NULL)
                    {
                    treeload (dictf);
                    (void) fclose (dictf);
                    }
                }
            /*
             * If dictf is null, we couldn't open the dictionary
             * specified in the -p switch. Complain.
             */
            if (dictf == NULL)
                {
                (void) fprintf (stderr, CANT_OPEN, p);
                perror ("");
                return;
                }
            }
        }

    if (!lflag && !aflag
      && access (personaldict, 2) < 0 && errno != ENOENT)
        {
        (void) fprintf (stderr, TREE_C_CANT_UPDATE, personaldict);
        (void) sleep (2);
        }
    }

static void treeload (dictf)
    register FILE *     dictf;          /* File to load words from */
    {
    char                buf[BUFSIZ];    /* Buffer for reading pers dict */

    while (fgets (buf, sizeof buf, dictf) != NULL)
        treeinsert (buf, 1);
    newwords = 0;
    }

void treeinsert (word, keep)
    char *              word;   /* Word to insert - must be canonical */
    int                 keep;
    {
    register int            i;
    struct dent             wordent;
    register struct dent *  dp;
    struct dent *           olddp;
#ifdef CAPITALIZATION
    struct dent *           newdp;
#endif
    struct dent *           oldhtab;
    int                     oldhsize;
    ichar_t                 nword[INPUTWORDLEN + MAXAFFIXLEN];
#ifdef CAPITALIZATION
    int                     isvariant;
#endif

    /*
     * Expand hash table when it is MAXPCT % full.
     */
    if (!cantexpand && (hcount * 100) / MAXPCT >= pershsize)
        {
        oldhsize = pershsize;
        oldhtab = pershtab;
        for (i = 0; i < sizeof goodsizes / sizeof (goodsizes[0]); i++)
            {
            if (goodsizes[i] > pershsize)
                break;
            }
        if (i >= sizeof goodsizes / sizeof goodsizes[0])
            pershsize += pershsize + 1;
        else
            pershsize = goodsizes[i];
        pershtab =
          (struct dent *) calloc ((unsigned) pershsize, sizeof (struct dent));
        if (pershtab == NULL)
            {
            (void) fprintf (stderr, TREE_C_NO_SPACE);
            /*
             * Try to continue anyway, since our overflow
             * algorithm can handle an overfull (100%+) table,
             * and the malloc very likely failed because we
             * already have such a huge table, so small mallocs
             * for overflow entries will still work.
             */
            if (oldhtab == NULL)
                exit (1);               /* No old table, can't go on */
            (void) fprintf (stderr, TREE_C_TRY_ANYWAY);
            cantexpand = 1;             /* Suppress further messages */
            pershsize = oldhsize;       /* Put things back */
            pershtab = oldhtab;         /* ... */
            newwords = 1;               /* And pretend it worked */
            }
        else
            {
            /*
             * Re-insert old entries into new table
             */
            for (i = 0; i < oldhsize; i++)
                {
                dp = &oldhtab[i];
                if (dp->flagfield & USED)
                    {
#ifdef CAPITALIZATION
                    newdp = tinsert (dp);
                    isvariant = (dp->flagfield & MOREVARIANTS);
#else
                    (void) tinsert (dp);
#endif
                    dp = dp->next;
#ifdef CAPITALIZATION
                    while (dp != NULL)
                        {
                        if (isvariant)
                            {
                            isvariant = dp->flagfield & MOREVARIANTS;
                            olddp = newdp->next;
                            newdp->next = dp;
                            newdp = dp;
                            dp = dp->next;
                            newdp->next = olddp;
                            }
                        else
                            {
                            isvariant = dp->flagfield & MOREVARIANTS;
                            newdp = tinsert (dp);
                            olddp = dp;
                            dp = dp->next;
                            free ((char *) olddp);
                            }
                        }
#else
                    while (dp != NULL)
                        {
                        (void) tinsert (dp);
                        olddp = dp;
                        dp = dp->next;
                        free ((char *) olddp);
                        }
#endif
                    }
                }
            if (oldhtab != NULL)
                free ((char *) oldhtab);
            }
        }

    /*
    ** We're ready to do the insertion. Start by creating a sample
    ** entry for the word.
    */
    if (makedent (word, &wordent) < 0)
        return;                 /* Word must be too big or something */
    if (keep)
        wordent.flagfield |= KEEP;
    /*
    ** Now see if word or a variant is already in the table. We use the
    ** capitalized version so we'll find the header, if any.
    **/
    strtoichar (nword, word, 1);
    upcase (nword);
    if ((dp = lookup (nword, 1)) != NULL)
        {
        /* It exists. Combine caps and set the keep flag. */
        if (combinecaps (dp, &wordent) < 0)
            {
            free (wordent.word);
            return;
            }
        }
    else
        {
        /* It's new. Insert the word. */
        dp = tinsert (&wordent);
#ifdef CAPITALIZATION
        if (captype (dp->flagfield) == FOLLOWCASE)
            (void) addvheader (dp);
#endif
        }
    newwords |= keep;
    }

static struct dent * tinsert (proto)
    struct dent *           proto;  /* Prototype entry to copy */
    {
    ichar_t                 iword[INPUTWORDLEN + MAXAFFIXLEN];
    register int            hcode;
    register struct dent *  hp;     /* Next trial entry in hash table */
    register struct dent *  php;    /* Prev. value of hp, for chaining */

    strtoichar (iword, proto->word, 1);
#ifndef CAPITALIZATION
    upcase (iword);
#endif
    hcode = hash (iword, pershsize);
    php = NULL;
    hp = &pershtab[hcode];
    if (hp->flagfield & USED)
        {
        while (hp != NULL)
            {
            php = hp;
            hp = hp->next;
            }
        hp = (struct dent *) calloc (1, sizeof (struct dent));
        if (hp == NULL)
            {
            (void) fprintf (stderr, TREE_C_NO_SPACE);
            exit (1);
            }
        }
    *hp = *proto;
    if (php != NULL)
        php->next = hp;
    hp->next = NULL;
    return hp;
    }

struct dent * treelookup (word)
    register ichar_t *      word;
    {
    register int            hcode;
    register struct dent *  hp;
    char                    chword[INPUTWORDLEN + MAXAFFIXLEN];

    if (pershsize <= 0)
        return NULL;
    (void) ichartostr (chword, word, 1);
    hcode = hash (word, pershsize);
    hp = &pershtab[hcode];
    while (hp != NULL && (hp->flagfield & USED))
        {
        if (strcmp (chword, hp->word) == 0)
            break;
#ifdef CAPITALIZATION
        while (hp->flagfield & MOREVARIANTS)
            hp = hp->next;
#endif
        hp = hp->next;
        }
    if (hp != NULL && (hp->flagfield & USED))
        return hp;
    else
        return NULL;
    }

#if SORTPERSONAL != 0
/* Comparison routine for sorting the personal dictionary with qsort */
pdictcmp (enta, entb)
    struct dent **  enta;
    struct dent **  entb;
    {
    /* The parentheses around *enta/*entb below are NECESSARY!
    ** Otherwise the compiler reads it as *(enta->word), or
    ** enta->word[0], which is illegal (but pcc takes it and
    ** produces wrong code).
    **/
    return casecmp ((*enta)->word, (*entb)->word, 1);
    }
#endif

treeoutput ()
    {
    register struct dent *  cent;       /* Current entry */
    register struct dent *  lent;       /* Linked entry */
#if SORTPERSONAL != 0
    int                     pdictsize;  /* Number of entries to write */
    struct dent **          sortlist;   /* List of entries to be sorted */
    register struct dent ** sortptr;    /* Handy pointer into sortlist */
#endif
    register struct dent *  ehtab;      /* End of pershtab, for fast looping */

    if (newwords == 0)
        return;

    if ((dictf = fopen (personaldict, "w")) == NULL)
        {
        (void) fprintf (stderr, CANT_CREATE, personaldict);
        return;
        }

#if SORTPERSONAL != 0
    /*
    ** If we are going to sort the personal dictionary, we must know
    ** how many items are going to be sorted.
    */
    if (hcount >= SORTPERSONAL)
        sortlist = NULL;
    else
        {
        pdictsize = 0;
        for (cent = pershtab, ehtab = pershtab + pershsize;
          cent < ehtab;
          cent++)
            {
            for (lent = cent; lent != NULL; lent = lent->next)
                {
                if ((lent->flagfield & (USED | KEEP)) == (USED | KEEP))
                    pdictsize++;
#ifdef CAPITALIZATION
                while (lent->flagfield & MOREVARIANTS)
                    lent = lent->next;
#endif
                }
            }
        for (cent = hashtbl, ehtab = hashtbl + hashsize;
          cent < ehtab;
          cent++)
            {
            if ((cent->flagfield & (USED | KEEP)) == (USED | KEEP))
                {
                /*
                ** We only want to count variant headers
                ** and standalone entries. These happen
                ** to share the characteristics in the
                ** test below. This test will appear
                ** several more times in this routine.
                */
#ifdef CAPITALIZATION
                if (captype (cent->flagfield) != FOLLOWCASE
                  && cent->word != NULL)
#endif
                    pdictsize++;
                }
            }
        sortlist =
          (struct dent **) malloc (pdictsize * sizeof (struct dent));
        }
    if (sortlist == NULL)
        {
#endif
        for (cent = pershtab, ehtab = pershtab + pershsize;
          cent < ehtab;
          cent++)
            {
            for (lent = cent; lent != NULL; lent = lent->next)
                {
                if ((lent->flagfield & (USED | KEEP)) == (USED | KEEP))
                    {
                    toutent (dictf, lent, 1);
#ifdef CAPITALIZATION
                    while (lent->flagfield & MOREVARIANTS)
                        lent = lent->next;
#endif
                    }
                }
            }
        for (cent = hashtbl, ehtab = hashtbl + hashsize;
          cent < ehtab;
          cent++)
            {
            if ((cent->flagfield & (USED | KEEP)) == (USED | KEEP))
                {
#ifdef CAPITALIZATION
                if (captype (cent->flagfield) != FOLLOWCASE
                  && cent->word != NULL)
#endif
                    toutent (dictf, cent, 1);
                }
            }
#if SORTPERSONAL != 0
        /* Close the file here too, as the sorted path below does */
        newwords = 0;
        (void) fclose (dictf);
        return;
        }
    /*
    ** Produce dictionary in sorted order. We used to do this
    ** destructively, but that turns out to fail because in some modes
    ** the dictionary is written more than once. So we build an
    ** auxiliary pointer table (in sortlist) and sort that. This
    ** is faster anyway, though it uses more memory.
    */
    sortptr = sortlist;
    for (cent = pershtab, ehtab = pershtab + pershsize;
      cent < ehtab;
      cent++)
        {
        for (lent = cent; lent != NULL; lent = lent->next)
            {
            if ((lent->flagfield & (USED | KEEP)) == (USED | KEEP))
                {
                *sortptr++ = lent;
#ifdef CAPITALIZATION
                while (lent->flagfield & MOREVARIANTS)
                    lent = lent->next;
#endif
                }
            }
        }
    for (cent = hashtbl, ehtab = hashtbl + hashsize;
      cent < ehtab;
      cent++)
        {
        if ((cent->flagfield & (USED | KEEP)) == (USED | KEEP))
            {
#ifdef CAPITALIZATION
            if (captype (cent->flagfield) != FOLLOWCASE
              && cent->word != NULL)
#endif
                *sortptr++ = cent;
            }
        }
    /* Sort the list */
    qsort ((char *) sortlist, (unsigned) pdictsize,
      sizeof (sortlist[0]), pdictcmp);
    /* Write it out */
    for (sortptr = sortlist; --pdictsize >= 0; )
        toutent (dictf, *sortptr++, 1);
    free ((char *) sortlist);
#endif

    newwords = 0;

    (void) fclose (dictf);
    }

char * mymalloc (size)
    {
    return malloc ((unsigned) size);
    }

void myfree (ptr)
    char *  ptr;
    {
    if (hashstrings != NULL && ptr >= hashstrings
      && ptr <= hashstrings + hashheader.stringsize)
        return;                 /* Can't free stuff in hashstrings */
    free (ptr);
    }

#ifdef REGEX_LOOKUP

/* Check the hashed dictionary for words matching the regex. Return */
/* a matching string if found, else return NULL. */
char * do_regex_lookup (expr, whence)
    char *  expr;       /* regular expression to use in the match */
    int     whence;     /* 0 = start at the beg with new regx, else */
                        /* continue from cur point w/ old regex */
    {
    static struct dent *    curent;
    static int              curindex;
    static struct dent *    curpersent;
    static int              curpersindex;
    static char *           cmp_expr;
    char                    dummy[INPUTWORDLEN + MAXAFFIXLEN];
    ichar_t *               is;

    if (whence == 0)
        {
        is = strtosichar (expr, 0);
        upcase (is);
        expr = ichartosstr (is, 1);
        cmp_expr = REGCMP (expr);
        curent = hashtbl;
        curindex = 0;
        curpersent = pershtab;
        curpersindex = 0;
        }

    /* search the dictionary until the word is found or the words run out */
    for ( ; curindex < hashsize; curent++, curindex++)
        {
        if (curent->word != NULL
          && REGEX (cmp_expr, curent->word, dummy) != NULL)
            {
            curindex++;
            /* Everybody's gotta write a weird expression once in a while! */
            return curent++->word;
            }
        }
    /* Try the personal dictionary too */
    for ( ; curpersindex < pershsize; curpersent++, curpersindex++)
        {
        if ((curpersent->flagfield & USED) != 0
          && curpersent->word != NULL
          && REGEX (cmp_expr, curpersent->word, dummy) != NULL)
            {
            curpersindex++;
            /* Everybody's gotta write a weird expression once in a while! */
            return curpersent++->word;
            }
        }
    return NULL;
    }
#endif /* REGEX_LOOKUP */
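The growth rule in `treeinsert` can be isolated into a small sketch: take the first entry of `goodsizes` that is larger than the current size, fall back to double-and-add-1 (`pershsize += pershsize + 1`) once the list runs out, and expand when `(hcount * 100) / MAXPCT >= pershsize`. `MAXPCT` comes from a header not shown here, so it is passed in as a parameter; `next_table_size` and `needs_expand` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Size table copied from goodsizes[] in tree.c. */
static const int good_sizes[] = { 53, 223, 907, 3631 };

/* Hypothetical helper: choose the next table size the way treeinsert
 * does.  Returns the first listed size larger than cur, or 2*cur + 1
 * ("double-and-add-1") once the list is exhausted. */
int next_table_size (int cur)
{
    size_t i;

    for (i = 0; i < sizeof good_sizes / sizeof good_sizes[0]; i++)
        if (good_sizes[i] > cur)
            return good_sizes[i];
    return cur + cur + 1;
}

/* Hypothetical helper: the expansion test from treeinsert.
 * maxpct stands in for MAXPCT, whose value is defined elsewhere. */
int needs_expand (int count, int size, int maxpct)
{
    assert (maxpct > 0);
    return (count * 100) / maxpct >= size;
}
```

For example, with an assumed MAXPCT of 70, a 53-slot table is due to grow at 38 entries, since 3800 / 70 = 54 >= 53. Note that nothing in `tree.c` as listed increments the static `hcount`, so in this version the test appears to fire only for the initial allocation, while `pershsize` is still 0; after that, new words lengthen the overflow chains instead.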
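`tinsert` and `treelookup` implement separate chaining in which the first node lives inline in the table slot: a free home slot is filled in place, otherwise a `calloc`'d node is appended to the end of that slot's chain. The sketch below reproduces that layout with a simplified entry type. `struct entry`, the `USED` stand-in, `toy_hash`, and both function names are hypothetical, since ispell's `struct dent` and `hash()` are defined in other modules. Unlike `tinsert`, it returns NULL on allocation failure instead of exiting, and it omits the capitalization-variant skipping.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define USED 1  /* stand-in for ispell's USED flag bit */

struct entry {
    const char *word;   /* caller's string; not copied */
    int flags;
    struct entry *next;
};

/* Hypothetical string hash; ispell's hash() lives in another module. */
static int toy_hash (const char *s, int size)
{
    unsigned long h = 0;

    while (*s)
        h = h * 31 + (unsigned char) *s++;
    return (int) (h % (unsigned long) size);
}

/* Insert like tinsert: fill the home slot if it is free, otherwise
 * append a calloc'd node to the end of that slot's chain. */
struct entry *chain_insert (struct entry *table, int size, const char *word)
{
    struct entry *hp;
    struct entry *php = NULL;

    assert (size > 0);
    hp = &table[toy_hash (word, size)];
    if (hp->flags & USED) {
        while (hp != NULL) {        /* walk to the end of the chain */
            php = hp;
            hp = hp->next;
        }
        hp = calloc (1, sizeof *hp);
        if (hp == NULL)
            return NULL;
    }
    hp->word = word;
    hp->flags = USED;
    hp->next = NULL;
    if (php != NULL)
        php->next = hp;             /* link the new overflow node */
    return hp;
}

/* Look up like treelookup: follow the chain while entries are in use. */
struct entry *chain_lookup (struct entry *table, int size, const char *word)
{
    struct entry *hp;

    assert (size > 0);
    hp = &table[toy_hash (word, size)];
    while (hp != NULL && (hp->flags & USED)) {
        if (strcmp (word, hp->word) == 0)
            return hp;
        hp = hp->next;
    }
    return NULL;
}
```

Storing the first node inline saves one allocation per occupied slot, which is why only collisions reach `calloc` in the original.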
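The absolute-path test in `treeinit` decides whether `HOME` (or, under OS/2, the configured library directory) may be prepended to a `-p` argument. A minimal sketch of the same rule as a function, assuming a runtime `os2` flag in place of the `#ifdef OS2` compile-time switch; `is_abspath` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring treeinit's abspath test.  A leading
 * '/', "./" or "../" counts as absolute (the home directory must not
 * be prepended); when os2 is nonzero, any ':' (drive specifier) also
 * counts.  Returns 1 for absolute, 0 otherwise. */
int is_abspath (const char *p, int os2)
{
    assert (p != NULL);
    if (*p == '/' || strncmp (p, "./", 2) == 0 || strncmp (p, "../", 3) == 0)
        return 1;
    return os2 && strchr (p, ':') != NULL;
}
```

As in the original, "./" and "../" are treated as absolute only in the sense that they rule out prepending the home directory.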
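The nondestructive sort in `treeoutput` builds an auxiliary array of pointers to the kept entries and sorts that array with `qsort`, so the hash tables themselves are never reordered. Because `qsort` passes the comparator pointers to the array elements, the comparator must dereference twice, which is what the parenthesized `(*enta)->word` in `pdictcmp` is about. The sketch uses a hypothetical `struct rec` and plain `strcmp` in place of ispell's `casecmp`, whose ordering rules are defined elsewhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct rec {
    const char *word;
};

/* qsort hands the comparator pointers to array elements, and each
 * element is itself a pointer, hence (*ra)->word. */
static int rec_ptr_cmp (const void *a, const void *b)
{
    const struct rec *const *ra = a;
    const struct rec *const *rb = b;

    return strcmp ((*ra)->word, (*rb)->word);
}

/* Fill out[0..n-1] with pointers to recs[0..n-1] and sort the
 * pointers, leaving recs[] itself in its original order. */
void sort_by_pointer (struct rec *recs, size_t n, struct rec **out)
{
    size_t i;

    assert (n == 0 || (recs != NULL && out != NULL));
    for (i = 0; i < n; i++)
        out[i] = &recs[i];
    qsort (out, n, sizeof out[0], rec_ptr_cmp);
}
```

Sorting pointers rather than entries also means each swap moves one pointer instead of a whole structure, which fits the comment's remark that the pointer table is faster at the cost of extra memory.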
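`do_regex_lookup` keeps its scan position in static variables so that repeated calls with a nonzero `whence` resume where the previous match left off. A sketch of the same resumable scan with the state moved into a caller-owned cursor, using POSIX `regcomp`/`regexec` as a stand-in for the `REGCMP` and `REGEX` macros, whose definitions are not part of this file. `struct regex_cursor` and the `cursor_*` functions are hypothetical, and the sketch scans a plain word array rather than ispell's hash tables.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical cursor replacing do_regex_lookup's static variables,
 * so independent searches do not share state. */
struct regex_cursor {
    regex_t re;
    size_t index;       /* next word to examine */
    int compiled;       /* nonzero once re holds a compiled pattern */
};

/* The whence == 0 case: compile the pattern and rewind.
 * Returns 0 on success, -1 if regcomp rejects the pattern. */
int cursor_start (struct regex_cursor *c, const char *pattern)
{
    if (c->compiled)
        regfree (&c->re);
    c->compiled = 0;
    c->index = 0;
    if (regcomp (&c->re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    c->compiled = 1;
    return 0;
}

/* The whence != 0 case: return the next matching word after the
 * cursor, or NULL once the list is exhausted.  NULL slots are skipped,
 * like empty hash-table entries. */
const char *cursor_next (struct regex_cursor *c,
                         const char *const *words, size_t n)
{
    assert (c->compiled);
    while (c->index < n) {
        const char *w = words[c->index++];
        if (w != NULL && regexec (&c->re, w, 0, NULL, 0) == 0)
            return w;
    }
    return NULL;
}

/* Release the compiled pattern. */
void cursor_end (struct regex_cursor *c)
{
    if (c->compiled)
        regfree (&c->re);
    c->compiled = 0;
}
```

The cursor's `compiled` field must be cleared before the first `cursor_start`. `do_regex_lookup` upper-cases the pattern before compiling it; the sketch leaves case handling to the caller.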