- Path: sparky!uunet!zephyr.ens.tek.com!shaman!endeavor18!alanj
- From: alanj@endeavor18.tek.com (Alan Jeddeloh)
- Newsgroups: comp.programming
- Subject: Re: Soundex algorithms, database indexing
- Message-ID: <1453@shaman.wv.tek.com>
- Date: 22 Jul 92 17:01:27 GMT
- Sender: nobody@shaman.wv.tek.com
- Reply-To: Alan.Jeddeloh@tek.com
- Distribution: usa
- Organization: Tektronix, Inc.
- Lines: 172
-
- In article <1992Jul22.122831.27758@cbnews.cb.att.com> you write:
- > I would like to get info/ a 'C' program (function) that makes use of
- > the Soundex algorithm.
-
- This has worked for me for generating Soundex code for searching Federal Census
- indexes. I tried to reply by mail, but it bounced off of att:
- >From postmaster Wed Jul 22 12:53 EDT 1992
- >Mail to `marc' alias `ssn=' from `att!endeavor18.wv.tek.com!alanj' failed.
- >The command `exec post -x -o '%^24name %20ema %^city, %+state' -- ssn=' returned error status 100.
- >The error message was:
- >post: ssn=: Ambiguous
- >0 Jung-Kuei_Chen Murray Hill, NJ
- >1 Gernot_Kubin Murray Hill, NJ
- >2 David_Mansour Murray Hill, NJ
- >3 Jacques_Terken Murray Hill, NJ
-
- ------------- CUT HERE ---------------
-
- #include <stdio.h>
- #include <string.h>
- #include <ctype.h>
-
- /*
- *
- * This is a summary of the soundex codes, and interpretations, should you run
- * across a Soundex Census. These REALLY are helpful (at least in my
- * experience).
- *
- * SOUNDEX CODE
- * ------------
- *
- *   Code   Key Letters & Equivalent
- *   ----   ------------------------
- *
- *     1    b, p, f, v
- *     2    c, s, k, g, j, q, x, z
- *     3    d, t
- *     4    l
- *     5    m, n
- *     6    r
- *
- *
- * Rules:
- * ------
- *
- * Letters a, e, i, o, u, y, w, h are NOT coded.
- *
- * First letter of the SURNAME is NOT encoded.
- *
- * All Soundex codes must contain 3 digits. Pad shorter codes with 0's.
- *
- * When two key letters or equivalent appear together (preceding or following)
- *** IN THE ORIGINAL NAME ***
- * they are counted as one (e.g., KELLER = K-460: the two L's together combine).
- * Key letters separated by a vowel are coded separately, which is what the
- * code below does (e.g., MENNON = M-550: the adjacent N's combine, the last
- * N is coded again).
- *
- *
- * When searching through records that contain the same Soundex code, names are
- * arranged alphabetically. (Prefixes COULD be dropped, e.g. van, Von, Di, de,
- * le, D, etc).
- *
- * Soundex codes are not limited by county/city locality, but rather by state.
- * This enables you to search an entire state for similar sounding names.
- *
- * 1880 Soundex Census will list households IF they had children under 10 yrs
- * old.
- *
- * 1900 Soundex Census is said to list ALL homes, regardless of children, etc.
- *
- *
- * Here are some examples to get an idea of how this code works.
- *
- *   NAME       CODE    WHY
- *   ----       ----    ----------------------
- *
- *   Masters    M-236   M-str (truncate rest)
- *   Anderson   A-536   A-ndr (truncate rest)
- *   Pratt      P-630   P-rt  (double letters, add 0)
- *   McGee      M-200   M-c   (g-same group. Not counted, add 0's)
- *   Lee        L-000   L-    (vowels not counted, add 0's)
- *
- */
-
- int letter_values[26] = {
- 0, 1, 2, 3, 0, 1, 2, 0, /* abcdefgh */
- 0, 2, 2, 4, 5, 5, 0, 1, /* ijklmnop */
- 2, 6, 2, 3, 0, 1, 0, 2, /* qrstuvwx */
- 0, 2 /* yz */
- };
-
-
- char name[256];
- char buffer[256];
-
- char alphas[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
- char delims[] = "\t\b\r\n ~`!@#$%^&*()_+{}:\"|,.?-=[];'\\,./";
-
-
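- /* Prints "name:<TAB>X-ddd" for one surname. */
- static void soundex (const char *nme);
-
- /* Isolate the first run of letters in s and print its Soundex code. */
- static void encode_first_word (char *s)
- {
-     char *p1;
-     char *p2;
-
-     p1 = strpbrk (s, alphas);
-     if (p1 == NULL)             /* no letters at all: nothing to encode */
-         return;
-     p2 = strpbrk (p1, delims);
-     if (p2)
-         *p2 = '\0';
-     soundex (p1);
- }
-
-
- int main (int argc, char *argv[])
- {
-     if (argc == 2) {
-         encode_first_word (argv[1]);
-     } else {
-         while (fgets (buffer, sizeof (buffer), stdin) != NULL)
-             encode_first_word (buffer);
-     }
-     return 0;
- }
-
-
- static void soundex (const char *nme)
- {
-     int length;
-     int i;
-     int last;
-     int count;
-
-     printf ("%s:\t", nme);
-
-     /* Bounded copy so an over-long argument cannot overflow name[]. */
-     strncpy (name, nme, sizeof (name) - 1);
-     name[sizeof (name) - 1] = '\0';
-     length = (int) strlen (name);
-
-     putchar (toupper ((unsigned char) name[0]));
-     putchar ('-');
-
-     /* Replace each letter by its Soundex digit; anything else becomes 0. */
-     for (i = 0; i < length; i++) {
-         int c = toupper ((unsigned char) name[i]);
-         if ((c >= 'A') && (c <= 'Z'))
-             name[i] = letter_values[c - 'A'];
-         else
-             name[i] = 0;
-     }
-
-     /*
-      * Emit up to three digits, skipping uncoded letters and any digit that
-      * repeats the one immediately before it (including the first letter's).
-      */
-     last = name[0];
-     count = 0;
-     for (i = 1; i < length; i++) {
-         if ((name[i] != 0) && (name[i] != last)) {
-             printf ("%d", name[i]);
-             if (++count >= 3)
-                 break;
-         }
-         last = name[i];
-     }
-
-     /* Pad to three digits. */
-     for (; count < 3; count++)
-         putchar ('0');
-     putchar ('\n');
- }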
- ----------- CUT ----------
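
If the code is wanted as a string (for example as an index key) rather than printed, the same rules can be sketched as a function that fills a caller-supplied buffer. This is only a minimal sketch under stated assumptions: the names `soundex_code` and `soundex_digit` are hypothetical and not part of the posted program, and it keeps the posted program's behavior of treating h and w like vowels.

```c
#include <ctype.h>
#include <stddef.h>

/* Soundex digit for each letter a..z; 0 means "not coded". */
static const int letter_values[26] = {
    0, 1, 2, 3, 0, 1, 2, 0,     /* abcdefgh */
    0, 2, 2, 4, 5, 5, 0, 1,     /* ijklmnop */
    2, 6, 2, 3, 0, 1, 0, 2,     /* qrstuvwx */
    0, 2                        /* yz */
};

/* Digit for one character; 0 for vowels, h, w, y and non-letters. */
static int soundex_digit(int c)
{
    c = toupper((unsigned char) c);
    return (c >= 'A' && c <= 'Z') ? letter_values[c - 'A'] : 0;
}

/*
 * Hypothetical helper (an assumption, not from the original post).
 * Writes the code for `name` into `out` as "X-ddd" plus a terminating
 * NUL, so `out` must hold 6 bytes.  Returns 0 on success, or -1 if
 * `name` is NULL or does not start with a letter.
 */
int soundex_code(const char *name, char out[6])
{
    int last;
    int digit;
    int count = 0;
    size_t i;

    if (name == NULL || !isalpha((unsigned char) name[0]))
        return -1;

    out[0] = (char) toupper((unsigned char) name[0]);
    out[1] = '-';

    /* The first letter is not encoded, but it still suppresses an
       immediately following letter from the same group. */
    last = soundex_digit(name[0]);
    for (i = 1; name[i] != '\0' && count < 3; i++) {
        digit = soundex_digit(name[i]);
        if (digit != 0 && digit != last)
            out[2 + count++] = (char) ('0' + digit);
        last = digit;
    }

    /* Pad to three digits. */
    while (count < 3)
        out[2 + count++] = '0';
    out[5] = '\0';
    return 0;
}
```

With `char buf[6];`, calling `soundex_code ("Keller", buf)` leaves "K-460" in `buf`, matching the KELLER example in the comment block.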
- --
- Alan Jeddeloh W:(503) 685-2991 H:(503) 292-9740
- Tektronix, Inc.
- D/S 60-850; PO Box 1000; Wilsonville, OR 97070
- Alan.Jeddeloh@tek.com Let sleeping dogs^H^H^H^Hbabies lie
-